Closed Bug 1284511 Opened 8 years ago Closed 8 years ago

Slow script/Unresponsive Script dialogs appear frequently when resuming app

Categories

(Core :: JavaScript Engine, defect)

50 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox49 blocking fixed
fennec 50+ ---
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: snorp, Assigned: arai)

References

Details

(Keywords: regression)

Attachments

(6 files)

I've noticed recently I get a 'slow script' dialog frequently if I switch back to Fennec after being in another app. Seems to happen no matter what sites I have open. Not reproducible.
See Also: → 1283633
tracking-fennec: --- → ?
:bkelly says this might be because we throttle timers[0] on mac/win but not on Android. Apparently we need to send some stuff on NS_WIDGET_SLEEP_OBSERVER_TOPIC[0] when sleeping/resuming.

[0] https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#441
[1] https://dxr.mozilla.org/mozilla-central/search?q=NS_WIDGET_SLEEP_OBSERVER_TOPIC
Assignee: nobody → snorp
OK all of that is super old so maybe unrelated
I saw this on Windows 10 laptop after sleep, though, so doesn't seem to be Fennec-specific. Moving to JS for lack of a better place.
Component: General → JavaScript Engine
Product: Firefox for Android → Core
Jim, Brad, here's another recent slow script related bug. Related to the e10s work we did for slow script dialogs?
Comment #4
tracking-fennec: ? → 50+
Flags: needinfo?(jmathies)
Flags: needinfo?(blassey.bugs)
This is in the fennec product? I don't see how our e10s related work for desktop plays into this.
Flags: needinfo?(jmathies)
Bug 1276626 looks likely.

:arai, what do you think?
Flags: needinfo?(arai.unmht)
Input from :mrbkap would be useful as well (see comment #7 and #8)
Flags: needinfo?(blassey.bugs) → needinfo?(mrbkap)
Triggered a build with logging.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c901052c3da3

will test once the build finishes.
okay, I got it.
the algorithm I used was wrong.

we need to calculate "how many times the half timeout happens" and "how long time elapsed from last RecordRuntimeActivity" separately.
will prepare a patch.
The bug in previous approach was that we increment mSlowScriptSecondHalfCount only by 1 even if long time elapsed, and it hits timeout everytime we check in watch dog, until RecordRuntimeActivity is called.

Then, I tried to increment mSlowScriptSecondHalfCount with proper amount, but I bumped into another issue:
if timeout is 10 seconds, we check every 5 seconds.
if 99.9 seconds elapsed without RecordRuntimeActivity because of sleep,
we would wait only extra 0.1 seconds before invoking interrupt, even if we increment mSlowScriptSecondHalfCount properly.

So, I changed the approach.
It would be simpler to record the last TimeSinceLastRuntimeStateChange value, and check the elapsed time from the that, and flip mSlowScriptSecondHalf every timeout.

This is the almost same approach that was used before bug 1276626 patch, in XPCJSRuntime.
  https://hg.mozilla.org/mozilla-central/rev/8272530c90ef
Assignee: snorp → arai.unmht
Status: NEW → ASSIGNED
Flags: needinfo?(arai.unmht)
Attachment #8787275 - Flags: review?(mrbkap)
This was deemed blocking for 49 at the channel meeting today. Marking it as such.
It might be better just backing out bug 1276626 patch, at least for 49, as the case fixed by the patch should be less critical than this.
Blocks: 1276626
Keywords: regression
OK. It looks like the fix in 1276626 was for a specific case. Maybe we can backout as you suggest. The risk from my point of view is that, since the fix in that bug landed 3 months ago, in the time in between, we have landed more patches from something like 6000 other bugs.  So, I don't know how to tell quickly if something else depends on the work in bug 1276626.   We can try the backout, for the RC build and release on Tuesday, then we have the rest of that week to uncover any regressions.
Attachment #8787275 - Flags: review?(mrbkap) → review+
Flags: needinfo?(mrbkap)
try run on beta branch with backout bug 1276626 patch:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=40c0efa59bb0

will land the attachment 8787275 [details] [diff] [review] patch shortly to m-i, after test,
and also trigger another try run on beta branch with the patch.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2872d7dca33b68a7c49adffeb2b671c28820ac60
Bug 1284511 - Fix slow script check to use elapsed time from last check. r=mrbkap
so far I don't see any test regression on beta tries.
can anyone test with actual case (laptop or android sleep) on those binaries?  or next nightly?
or, can anyone provide the steps to reproduce?
I cannot reproduce the issue on Windows 10 laptop with 49.0b8.
(I opened treeherder in 2 tabs and tries sleep with ~1 minutes several times)

snorp, do you know?
Flags: needinfo?(snorp)
https://hg.mozilla.org/mozilla-central/rev/2872d7dca33b
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
for now, attaching backported patches.
I'm still trying to reproduce the issue, but no luck...
Attachment #8787972 - Flags: review+
Summary: Slow script dialogs appear frequently when resuming app → Slow script/Unresponsive Script dialogs appear frequently when resuming app
See Also: → 1286739
Comment on attachment 8787972 [details] [diff] [review]
(mozilla-aurora) Fix slow script check to use elapsed time from last check. r=mrbkap

for now, requesting uplift.
once someone verified the fix, I will rewrite the uplift request message.


Approval Request Comment
> [Feature/regressing bug #]
Bug 1276626.

> [User impact if declined]
Slow script dialog may be shown after sleep

> [Describe test coverage new/current, TreeHerder]
This needs manual test, as it's related to sleep, and it cannot be done in automated tests.

> [Risks and why]
Medium, as this is not yet tested on the env that hits the issue.
I cannot reproduce the issue locally with affectes builds (aurora/nightly).

> [String/UUID change made/needed]
None
Attachment #8787972 - Flags: approval-mozilla-aurora?
Comment on attachment 8787973 [details] [diff] [review]
(mozilla-beta) Fix slow script check to use elapsed time from last check. r=mrbkap

Approval Request Comment
> [Feature/regressing bug #]
Bug 1276626.

> [User impact if declined]
Slow script dialog may be shown after sleep

> [Describe test coverage new/current, TreeHerder]
This needs manual test, as it's related to sleep, and it cannot be done in automated tests.

> [Risks and why]
Medium, as this is not yet tested on the env that hits the issue.
I cannot reproduce the issue locally with affected builds (aurora/nightly).

> [String/UUID change made/needed]
None
Attachment #8787973 - Flags: approval-mozilla-beta?
Comment on attachment 8787973 [details] [diff] [review]
(mozilla-beta) Fix slow script check to use elapsed time from last check. r=mrbkap

OK, let's try with this patch, taking it to 49 rc 1
FYI, we will have to do a second rc anyway for bug 1297687
Attachment #8787973 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8787972 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I don't have solid STR. I tried for a while to find some, but wasn't able to come up with anything.
Flags: needinfo?(snorp)
We should be able to use telemetry to figure out if the patch helps or not.
*Almost* every time I come back from sleep on Windows 10 running Aurora I see a slow script dialog for IRCCloud in a pinned tab. Weak STR but maybe useful.
(In reply to Andrew Overholt [:overholt] from comment #30)
> *Almost* every time I come back from sleep on Windows 10 running Aurora I
> see a slow script dialog for IRCCloud in a pinned tab. Weak STR but maybe
> useful.

I've noticed more frequent slow script dialogs too on Aurora, often in the browser/base/content/browser.js script. I wasn't doing anything and Firefox wasn't being slow or anything, so it looked like a false positive.
Thanks.
now I'm testing on IRCCloud, but so far I don't see the dialog.
Does the issue still happen on today's aurora?
Just came up for me on 49 RC1 on a gmail tab after resuming from sleep with this error: Script: https://plus.google.com/_/scs/…cCMjOvovKNJ3foLAYjuM34xrVTC2sw:114 .  I don't particularly use google plus. But I think this is from the little red g+ notification number in the upper corner of gmail. Some sort of push notification thing?
should we try backing out whole patches (this bug and bug 1276626) then?
this bug's patch should be able to be backed out cleanly.
Here are patches that backout bug 1276626 patch.
Comment on attachment 8789180 [details] [diff] [review]
(mozilla-aurora) Backed out changeset 8272530c90ef (bug 1276626)

about mozilla-central, I'll land backout patches shortly.


asking approval for backing out bug 1284511 patch and this bug's patch:
  https://hg.mozilla.org/releases/mozilla-aurora/rev/37c314d53f82
  https://hg.mozilla.org/releases/mozilla-aurora/rev/8272530c90ef

this patch backs out 8272530c90ef.
37c314d53f82 should be able to be backed out cleanly.


Approval Request Comment
> [Feature/regressing bug #]
bug 1284511 and this bug
bug 1284511 caused the issue, tried a fix on this bug, but it didn't work.

> [User impact if declined]
Slow script dialog may be shown after sleep

> [Describe test coverage new/current, TreeHerder]
This needs manual test, as it's related to sleep, and it cannot be done in automated tests.

> [Risks and why]:
Medium, as this is not yet tested on the env that hits the issue.
I still cannot reproduce the issue locally with affected builds.

> [String/UUID change made/needed]
None
Attachment #8789180 - Flags: approval-mozilla-aurora?
Comment on attachment 8789181 [details] [diff] [review]
(mozilla-beta) Backed out changeset 8272530c90ef (bug 1276626)

also, asking approval for backing out bug 1284511 patch and this bug's patch:
  https://hg.mozilla.org/releases/mozilla-beta/rev/9aa5886a1563
  https://hg.mozilla.org/releases/mozilla-beta/rev/8272530c90ef

this patch backout 8272530c90ef.
9aa5886a1563 should be able to be backed out cleanly.


Approval Request Comment
> [Feature/regressing bug #]
bug 1284511 and this bug
bug 1284511 caused the issue, tried a fix on this bug, but it didn't work.

> [User impact if declined]
Slow script dialog may be shown after sleep

> [Describe test coverage new/current, TreeHerder]
This needs manual test, as it's related to sleep, and it cannot be done in automated tests.

> [Risks and why]:
Medium, as this is not yet tested on the env that hits the issue.
I still cannot reproduce the issue locally with affected builds.

> [String/UUID change made/needed]
None
Attachment #8789181 - Flags: approval-mozilla-beta?
(In reply to Andrew Overholt [:overholt] from comment #30)
> *Almost* every time I come back from sleep on Windows 10 running Aurora I
> see a slow script dialog for IRCCloud in a pinned tab. Weak STR but maybe
> useful.

+1 to these STR from Andrew. In my case though, on RC1, I have yet to see the slow script dialog pop up. So I for one feel the change we landed in RC1 worked, especially on win10.

Are we certain everyone is testing against RC1 and not the earlier builds?
I am getting the slow script warning in gmail and google docs after resuming from sleep (Mac, 10.11, with the RC2 build today) But I am just one person so consider that anecdotal, and I can't reproduce it every time. 
 
I think Brad was judging from the Nightly telemetry data that the fix in nightly didn't decrease the slow script reports.
Comment on attachment 8789180 [details] [diff] [review]
(mozilla-aurora) Backed out changeset 8272530c90ef (bug 1276626)

Let's uplift the backout to aurora as well, and see if we get any useful user feedback. And, we should know more from telemetry on nightly by Monday.
Attachment #8789180 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I don't know if this is related, but the Fidelity.com website is so slow, it's been rendered virtually unusable in v. 49 beta. It works fine in ESR and is a bit better in Nightly.
In relation to my previous comment, I'm talking about after one logs in. Ummmmm, sorry, but I'm not going to send you my log-in data so you can test it. 
;-)

I don't, however, get the message that a script is slowing down Fx. It's just that the elements take forever to load and every time I try to visit the page, at one point or another, I get the message in the title bar (which eventually goes away) "Program not responding".

By the way, I'm using the very latest version of 49 beta (07 Sep).
Sorry, I should have mentioned this earlier (I still find it stupid that there's no way to edit comments): With a new profile it works fine, but I've removed No Script and UBlock Origin, and this hasn't helped.
Is bug 1294782 related to this? It's not from the resuming though, just put Firefox on background for a while and I will receive unresponsive script error.
After discussion about telemetry results with chutten, I think we will go ahead and try uplifting the backout for the RC3 build. He will post more details soon.
Comment on attachment 8789181 [details] [diff] [review]
(mozilla-beta) Backed out changeset 8272530c90ef (bug 1276626)

[Triage Comment]

Let's bring the backout to beta and release for the RC3 build.
Attachment #8789181 - Flags: approval-mozilla-release+
Attachment #8789181 - Flags: approval-mozilla-beta?
Attachment #8789181 - Flags: approval-mozilla-beta+
At :lizzard's and :bsmedberg's behest, I analyzed [1] the current data we have on the probe SLOW_SCRIPT_NOTICE_COUNT (which counts the number of times a user sees the slow script dialog). It is a comparison between nightly 0910 and nightly 0903 (Saturdays, to maintain day-of-week volume comparability).

This analysis is as-of-yet unreviewed, but shows data consistent with what's seen on telemetry.mozilla.org[2]. Specifically, that the number of reports (and clients submitting those reports) of slow script notices has dropped to 20% of previous figures. I suspect not all pings and clients have reported in for the build, but assuming that the pings and clients are representative, we can say the following about rates:

There are now more pings from fewer clients reporting more notices. Or, to put it colloquially, it seems as though we have traded a nightly where some people had poor experiences for a nightly where few people had hideous experiences.

All that with the needle on the notices-per-usage-hour barely moving at all.

This seems to be a return of this measure to the values and volumes experienced earlier this year until the beginning of June, which is consistent with the patchset backed out.

[1]: https://gist.github.com/chutten/feb43ea722a7a0dcacfe399ba03a1ffe
[2]: https://mzl.la/2cHTHfX
(In reply to Chris H-C :chutten from comment #54)

> There are now more pings from fewer clients reporting more notices. Or, to
> put it colloquially, it seems as though we have traded a nightly where some
> people had poor experiences for a nightly where few people had hideous
> experiences.

Thanks for all the details! Is there a bug already filed for helping those people?
Flags: needinfo?(chutten)
(In reply to David Bolter [:davidb] from comment #57)
> (In reply to Chris H-C :chutten from comment #54)
> 
> > There are now more pings from fewer clients reporting more notices. Or, to
> > put it colloquially, it seems as though we have traded a nightly where some
> > people had poor experiences for a nightly where few people had hideous
> > experiences.
> 
> Thanks for all the details! Is there a bug already filed for helping those
> people?

I think bug 1276626 may be that bug, as it was the effort to try and improve what was thought to be the cause.
Flags: needinfo?(chutten)
See Also: → 1306683
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: